Unit 4: Data Cleanse Transforms 


For example, TechTel is not in the default dictionary, so Data Cleanse capitalizes only the first 
letter of the word. However, if you add the word TechTel to your dictionary with a standard for 
firm name use, you can achieve the desired mixed-case results: 


Table 28: Casing Adjustment Results 


Input            Dictionary entry    Output
TECHTEL, INC.    (none)              Techtel, Inc.
TECHTEL, INC.    TechTel             TechTel, Inc.
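
The casing behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Data Cleanse implementation: the function name standardize_firm_casing and the custom_standards mapping are hypothetical stand-ins for the transform and its dictionary.

```python
# Illustrative sketch only; names here are hypothetical, not Data Cleanse options.
def standardize_firm_casing(text, custom_standards):
    """Capitalize the first letter of each word unless a custom standard exists."""
    words = []
    for token in text.split():
        core = token.rstrip(",.")          # word without trailing punctuation
        trailing = token[len(core):]       # punctuation to re-attach afterwards
        standard = custom_standards.get(core.upper())
        if standard is None:
            standard = core.capitalize()   # default: first letter only
        words.append(standard + trailing)
    return " ".join(words)


# Without a dictionary entry, only the first letter is capitalized.
print(standardize_firm_casing("TECHTEL, INC.", {}))
# With a (hypothetical) entry for TechTel, the mixed-case form is kept.
print(standardize_firm_casing("TECHTEL, INC.", {"TECHTEL": "TechTel"}))
```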


Ranking and Prioritizing Parsing Engines 





The Data Cleanse transform can be configured to use only specific parsers or a specific 
parser order when dealing with multiline input. You can change the parser order for a specific 
multiline input by modifying the corresponding parser sequence option in the 
Parser_Configuration options group for the Data Cleanse transform. For example, to change 
the order of parsers for the Multiline1 input field, modify the Parser_Sequence_Multiline1 
option. 


By default, Data Cleanse parses multiline input using parsers in the following order: 
1. User-defined pattern matching 
2. E-mail 
3. Social Security number 
4. North American phone number 
5. International phone number 
6. Person or firm name 
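
To make the effect of parser order concrete, here is a minimal Python sketch of an ordered parser sequence in which the first matching parser wins. Everything in it is an assumption for illustration: the parser names, the simplified regular expressions, and the parse_line function are hypothetical and are not the option values or logic used by Data Cleanse. User-defined pattern matching is left out for brevity.

```python
import re

# Hypothetical parser names with deliberately simplified patterns.
PARSERS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "NORTH_AMERICAN_PHONE": re.compile(r"^\(?\d{3}\)?[ -]?\d{3}-\d{4}$"),
    "INTERNATIONAL_PHONE": re.compile(r"^\+\d[\d -]{6,}$"),
    "PERSON_OR_FIRM": re.compile(r"^[A-Za-z][A-Za-z .,'&-]*$"),
}

# Mirrors the default order listed above (minus user-defined patterns).
DEFAULT_SEQUENCE = [
    "EMAIL",
    "SSN",
    "NORTH_AMERICAN_PHONE",
    "INTERNATIONAL_PHONE",
    "PERSON_OR_FIRM",
]


def parse_line(line, sequence=DEFAULT_SEQUENCE):
    """Return the name of the first parser in `sequence` that matches, else None."""
    text = line.strip()
    for name in sequence:
        if PARSERS[name].match(text):
            return name
    return None


for sample in ["jane@example.com", "123-45-6789", "555-123-4567", "TechTel, Inc."]:
    print(sample, "->", parse_line(sample))

# A shorter sequence tries fewer parsers, so unrelated lines are left unparsed.
print(parse_line("TechTel, Inc.", sequence=["EMAIL", "SSN"]))
```

Passing a shorter sequence stands in for turning parsers off: fewer patterns are tried per line, and a line cannot be claimed by a parser you did not ask for.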


Hint: 
Data Cleanse parser prioritization options can be modified in the Ordered Options 
window. Carefully selecting which parsers to use, and in what order, can be 
beneficial. Turning off parsers that you do not need significantly improves 
parsing speed and reduces the chances that your data will be parsed incorrectly. 


LESSON SUMMARY 
You should now be able to: 


• Use Data Cleanse transforms 